APPLICATION OF A SIMPLIFIED METHOD OF GRAPHIC CURVILINEAR CORRELATION

By L. H. Bean, Senior Agricultural Economist, Division of
Statistical and Historical Research, Bureau of Agricultural Economics

The practicing statistician and economist who is frequently called
upon to determine the quantitative relationships between two or more factors
often finds it inconvenient or undesirable to apply the formal technique of
multiple curvilinear correlation as described in the Journal of the American
Statistical Association.1/ Time and clerical help are often lacking, or
insufficient data do not warrant the use of the formal technique. It is the
writer's experience, shared undoubtedly by others who have seriously attempted
analysis of time series or other problems involving small numbers of
observations (30 or less), that it is often possible to resort to simplified
methods of multiple correlation requiring little time or labor and yielding
results of considerable practical value.

The purpose of this paper is to present this simple approach to multiple
curvilinear correlation. The method employed will be demonstrated with six
examples, or cases, of actual problems chosen from different fields of economic
relationships, in the hope that this "case" method of presentation will not only
make clear the simple steps involved, but will also suggest their application
to similar problems likely to be encountered by the analyst of variations in
economic data. The method will further be demonstrated by means of a generalized
problem, but in this final illustration also we shall refrain from generalization.

Technical language will be used as little as possible, but the reader will need
to study closely the graphic presentations, for the method is essentially one
of graphic correlation. In each example the data used and the steps in the
analysis will be so indicated that the reader may properly appraise the
reasonableness of the approach and the reliability of the results.2/ The
assumptions and logic involved in each of the six special cases will also be
indicated, but only briefly, for we are concerned here primarily with the
simple method of correlating certain factors and not with the reasons for
selecting the factors used.


To the reader acquainted with the formal method of multiple curvilinear
correlation it may be of interest to observe at the outset that the methods used
in the examples are not unlike those now in use. The formal method involves (1)
multiple linear correlations to determine a first approximation to the net effect
of each independent factor on the dependent one, (2) computation of residuals, or
differences between the values of the dependent variable and values estimated
from the linear regressions, (3) plotting the residuals as deviations from each
of the linear regression curves, and then (4) the reduction of the residuals to
a minimum by a process of successive approximations which involves the free hand
drawing of curvilinear regression lines. In the approach illustrated here,
steps (1), (2), and (3) are not used. In their stead one or more simple scatter
diagrams are used, first approximations to the curves are drawn free hand by
inspection, and the residuals are usually reduced to a minimum by subtracting
first the effect of one variable and then of another. The first approximation
curves are then adjusted, if necessary, with reference to the residual variation
until no further changes are indicated. This simple approach will be illustrated
by the following six examples.

1/ See Journal American Statistical Association, Vol. XVIII, No. 144, A Method of
Handling Multiple Correlation Problems, by H. R. Tolley and M. J. B. Ezekiel, and
Vol. XIX, No. 148, A Method of Handling Curvilinear Correlation for any Number
of Variables, by M. J. B. Ezekiel.


The first case, or example, involves three variables, one dependent
and two independent, where the effect of the second is first removed, and
the residuals practically entirely explained by the third. The example
deals with potato prices.


The second case also involves three variables and is like case I
except that the third variable is first adjusted for trend before it is
used to explain the residuals derived from the relationship between the
first two variables. The example deals with mill consumption of cotton.


The third case involves four variables, the fourth of which is a
composite of "other" factors represented by a regular trend in residuals.
This example deals with the cotton consumption data of case II.


The fourth case involves four variables, in which the fourth is a
composite of "other" factors represented by an irregular trend in residuals.
The example deals with the yearly average price of apples.


The fifth case, dealing with orange prices, involves five variables,
two of which are highly intercorrelated.


In the sixth case, dealing with acreage changes, the simple approach
is applied to a correlation problem in three variables, with one dependent
variable expressed as relative first differences, or percentage changes
from one period to the next.


In the seventh case, the simple method is applied to a general problem
of 4 variables, 30 observations, and high intercorrelations between
each of the variables.


After these cases have been presented, something will be said in
conclusion concerning the limitations of the methods used here, the results
compared with those obtainable by the methods now in use, and the
significance that may be attached to results from the analysis of
relatively short time series.

CASE I

Relation of (1) production of early potatoes and (2) the price of
old potatoes to (3) the price received by producers of early potatoes

This case deals with three variables, the first and second of which
are almost perfectly correlated with the third. It illustrates the method
of determining by inspection the net relation between the first variable
and the third (dependent) and then the relation between the second variable
and the residual fluctuations in the dependent not explained by the first.

The data and the steps involved in this analysis are all contained
in Figure 1. Disregarding for the moment the solid curve in section 1, we
have here a scatter diagram in which the production of early potatoes for
the period 1921-1928 is plotted against the price received by producers.
These prices are shown in section 4. The price of old potatoes is plotted
in section 3.


Figure 1. The effect of early potato production and the price of old
potatoes on the growers' price of early potatoes


Starting with these two price series and the scatter diagram, our aim is to
establish first the effect of production on the price received by producers.
In other words, we wish to draw a curve through the observations in section 1
which will represent the net effect of production alone, with the effect of
the price of old potatoes held constant.

By inspecting the movements of prices of old potatoes, it is observed
that the prices in 1923 and 1925 were the same or nearly so; that is, in these
two years the effect of old potato prices may be assumed to have been equal
or constant, so that the difference in the prices received by growers of early
potatoes may be tentatively assumed to be due to variations in production.
By connecting the observations for these two years in the scatter diagram, the
apparent effect of production in these two years of constant old potato prices
is indicated.

We may now make the further observation that prices of old potatoes in 1922,
1924, and 1928 were about the same, suggesting that their effect on the prices
received was probably constant in these three years. By drawing a curve which
will pass through the three corresponding observations in the scatter diagram,
we obtain the effect of production independent of old potato prices in these
three years.

We may now make a final observation that old potato prices were high in
1926 and 1927 and low in 1921, 1923, and 1925, which suggests that a curve may
be so drawn through the observations in the diagram as to leave the 1921,
1923, and 1925 observations below the curve and those of 1926 and 1927 above,
and to parallel the previous short segments. (The curve shown in the diagram
was so drawn.) This gives us a tentative indication of the net effect of
production on the price received.

Our next step is to see to what extent the deviations from this average
supply and price curve can be explained by the fluctuations in the price of old
potatoes. These deviations, which are the portions of the price not explained
by production, may be measured or read graphically directly from the diagram,
and are shown plotted in section 3. The quantitative relationship between the
price of old potatoes and the deviations from the supply and price curve is
shown in section 2, where the May prices of old potatoes are plotted against
the deviations. By slight adjustments in the preliminary supply and price
curve it becomes evident that the observations in the scatter diagram of section
2 lie along a curve which may be drawn in free hand. Inasmuch as there are
only minor deviations from this second curve it is clear that these two factors
(supply of early potatoes and price of old potatoes) account for practically
all of the variations in the price received by producers of early potatoes.

The extent to which that is true can be demonstrated by reading from
the supply and price curve, for the production of each year, the average effect
of production on price, and adding to it or subtracting from it the readings
from the second curve of the average effect of old potato prices on early
potato prices. The algebraic sums of these two readings for the corresponding
years, as well as the actual prices, are shown in section 4.

It will be observed that the almost perfect correlation shown in the
comparison between the actual and the estimated prices in section 4 was
obtained merely by (1) plotting the data used in the analysis, (2) drawing by
inspection a free hand net regression curve of supply and price, (3) plotting
deviations read from this curve (which deviations may be considered as the
original prices with the effect of production eliminated) against the second
independent factor (the price of old potatoes), and (4) drawing a free hand
curve through this second scatter diagram. Another step that should be
employed, unless the residual variation is practically zero as it is in
this case, is to plot the final residuals as deviations from the two curves
as a final test of goodness of fit.
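
The steps just listed, together with the readjustment of the first curve with reference to the residuals, can be sketched in code. What follows is only an illustration under stated assumptions, not the graphic procedure itself: a least-squares straight line stands in for each curve drawn free hand, the function names are hypothetical, and the eight observations are invented numbers in which the two effects add exactly. They are not the potato data of Figure 1.

```python
def fit_line(xs, ys):
    """Least-squares straight line, a stand-in for a curve drawn free hand."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def read_curve(curve, x):
    """Read the height of a fitted line at x."""
    slope, intercept = curve
    return intercept + slope * x


def successive_elimination(x1, x2, y, rounds=25):
    """Remove the effect of x1, explain the residuals by x2, then readjust."""
    curve2 = (0.0, 0.0)  # at the start no effect is attributed to x2
    for _ in range(rounds):
        # Dependent variable with the current effect of x2 taken out.
        part1 = [yi - read_curve(curve2, b) for yi, b in zip(y, x2)]
        curve1 = fit_line(x1, part1)
        # Residuals from the first curve, related to the second factor.
        resid = [yi - read_curve(curve1, a) for yi, a in zip(y, x1)]
        curve2 = fit_line(x2, resid)
    # Estimate = algebraic sum of the readings from the two curves.
    estimates = [read_curve(curve1, a) + read_curve(curve2, b)
                 for a, b in zip(x1, x2)]
    return curve1, curve2, estimates


# Invented observations (NOT the data of Figure 1).
production = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
old_price = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
price = [10.0 - 2.0 * a + 0.5 * b for a, b in zip(production, old_price)]

curve1, curve2, estimates = successive_elimination(production, old_price, price)
largest_miss = max(abs(e - p) for e, p in zip(estimates, price))
print("largest final residual:", largest_miss)
```

With real data the final residuals would not vanish, and plotting them against each curve remains the test of fit. The repeated loop plays the part of the small adjustments made to the preliminary curve.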


The other examples which follow are in a large measure only variations
from this simple case. They all involve determining curves by inspection,
eliminating the effect of one variable from the original dependent, and
eliminating the effect of the second and third from the residuals, thus
reducing the latter to a minimum. The last step, checking the final slope of
the curves with reference to the final residuals, was not deemed necessary
in the following illustrations in view of the very small final residuals.

CASE II

Relation of cotton prices and business conditions to the
domestic mill consumption of cotton

This case varies from case I only in that one of the independent
variables shows a very definite upward trend, but the same method of
correlation may be employed if a simple adjustment for trend is used.

Two main assumptions are involved in this analysis of cotton
consumption. First, variations in domestic mill consumption of cotton are due very
largely to the price of cotton in relation to the price of cotton goods and
to changes in business activity. Low prices due to large cotton crops create
favorable price margins for manufacturers. Under such conditions cotton is
purchased beyond the current requirements for later consumption; conversely,
high cotton prices create unfavorable profit margins, and curtailment of
purchases of raw cotton is reflected in subsequent curtailment in cotton mill
activity. Thus it appears that variations in price of cotton during a given
crop year (July-June) are reflected in mill consumption during the following
calendar year (indicating roughly a lag of about six months if semi-annual or
monthly data were used). The second assumption is that cotton mill consumption
is also affected by general business conditions, which reflect the
industrial demand for cotton and the buying power of consumers in their demand
for cotton goods.


Figure 2 contains the following: in section 4 the index of cotton
consumption, 1919 to 1928, and in section 3 the general index of production of
manufactures, both published by the Federal Reserve Board (1923-25 = 100);
in section 1 the New Orleans crop year average price 3/ of middling spot cotton
plotted against the index of consumption in the form of a scatter diagram.
The index of manufactures is here used as a general measure of business
activity. If the curve in the scatter diagram be disregarded for the moment it will
be evident that the scatter is wide and that no appreciable correlation between
price and consumption is apparent on the surface. However, the nature of the
effect of price on consumption becomes evident after a moment's inspection
of the index of business activity.

We note first that the index of business activity has an upward trend.
This suggests studying the variations in this index above and below a trend and

3/ Adjusted for changes in the Bureau of Labor Statistics index of wholesale
prices, 1926 = 100.

Figure 2. The effect of cotton prices and business activity on
domestic cotton (mill) consumption

relating them to the position of the comparable observations in the scatter
diagram.4/ Disregarding for the moment the trend lines finally used
here, we note that the years 1923, 1925, 1926, and 1928 were each years
of high business activity, as indicated by the dotted upper line. We then
observe the location of the corresponding points in the scatter diagram of
consumption against price, and draw a line through them. Next we note
that the indexes of business activity for the years 1919, 1920, 1922, and
1924 lie on a lower trend line, with 1921 considerably lower. Again we
find the price-consumption points for these years in the lower part of the
diagram, with the observations for 1921 and 1924, the years of greatest
business depression, lower than the rest. Our problem now is to draw a free
hand trend line through the index of business activity and to draw a
corresponding curve through the scatter diagram, so that for each of the
deviations from the trend line there will be a corresponding deviation from the
price-consumption curve. A straight line drawn in approximately midway
between the tentative upper and lower lines through the indexes of business
activity passes through the index for 1927. This suggests that a curve
may also be drawn between the upper and lower tentative lines in the
scatter diagram, also passing through the "1927" point. Having drawn in
these two center lines, their adequacy is tested by plotting the deviations
from trend in business activity against the deviations from the
consumption-price curve (see section 2). After very slight adjustments in the
curve and trend line, it is found that the relationship between the
deviations is best represented by a straight line.
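
The trend adjustment described above can be sketched as follows. This is an illustration under stated assumptions: the trend is a least-squares straight line rather than the free hand line drawn midway between the upper and lower tentative lines, the function name `trend_deviations` is hypothetical, and the index values are invented for the example rather than read from Figure 2.

```python
def trend_deviations(years, index):
    """Return (slope, intercept, deviations) for a straight-line trend.

    The trend is fitted by least squares as a stand-in for a line drawn
    free hand; deviations are actual index values minus trend values.
    """
    n = len(years)
    mean_year = sum(years) / n
    mean_index = sum(index) / n
    sxx = sum((t - mean_year) ** 2 for t in years)
    sxy = sum((t - mean_year) * (v - mean_index)
              for t, v in zip(years, index))
    slope = sxy / sxx
    intercept = mean_index - slope * mean_year
    deviations = [v - (intercept + slope * t) for t, v in zip(years, index)]
    return slope, intercept, deviations


# Invented index values (not the Federal Reserve Board series): a rise of
# 5 points a year with fluctuations of +1, -2, +2, -2, +1 about the trend.
years = [1919, 1920, 1921, 1922, 1923]
index = [101.0, 103.0, 112.0, 113.0, 121.0]

slope, intercept, deviations = trend_deviations(years, index)
for year, dev in zip(years, deviations):
    print(year, round(dev, 2))
```

The deviations so obtained are the quantities that would be plotted against the deviations from the price-consumption curve, year by year.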

The completeness with which the curves thus derived (I, the effect
of price, and II, the effect of variations in business activity) explain
the variations in cotton consumption appears in section 4, where the
algebraic sums of the readings, or estimates, are shown in a broken line.

It is to be observed that in other cases of this sort, an adjustment
for a downward trend in one of the variables may be necessary.

CASE III

Cotton Consumption (Continued)

In this third case we have an illustration of a four variable problem
in which the fourth variable is a regular trend in residuals after
eliminating the effects of two other variables, and may be considered as a composite
of "other" factors not included in the problem but related to the progress
in time. The analysis used in the second illustration lends itself also
as an illustration here if we adopt a slight modification of the foregoing
procedure.

Thus, instead of assuming a trend from which to measure deviations in
business activity we may relate the actual index to the deviations from the
price-consumption curve and obtain a second set of residuals. That
procedure gives the scatter diagram in section 5, Figure 3 (in place of section
3, Figure 2). At first glance it appears that the relationship between the

4/ For a criticism of trend elimination, see Journal American Statistical
Association, Vol. XX, note on Error in Eliminating Secular Trend and
Seasonal Variation Before Correlating Time Series, by Bradford B. Smith.


index and the price-consumption deviations is a curvilinear one, calling for
a curve drawn from the lower left corner of the diagram, tapering off into
the upper right corner. But by following the observations in time sequence,
as indicated by the dotted line beginning with 1919, it becomes obvious that
the net effect of business activity on the price-consumption deviations is
best represented by a straight line.5/ This straight line, drawn parallel
to the lines for successive short periods, has a slope such that a 20 point
change in the index of business (from 100 to 80) is accompanied by a change
in the index of consumption of 30 points. It may here be observed that this
slope, determined independently, is identical with the one in Figure 2, where
a deviation from trend of 20 points is accompanied by a deviation in the
consumption index of 30 points.

If we now proceed to plot observations in section 5, Figure 3, as
deviations from curve III, we find that they show a downward trend and fall
along a straight line, which is accordingly drawn in as in section 6.

The procedure in this third illustration, it should be clear, was first
to eliminate the effect of one variable (price) on the dependent (cotton
consumption) by measuring residuals from the price-consumption curve determined
by inspection. We next eliminated the effect of another variable (business
activity) from these first residuals by measuring second residuals from curve
III, also determined by inspection. Finally, time, considered as another
composite variable, almost completely explains the final residuals in the form
of a downward straight line trend. Readings from I, III, and IV summed
algebraically give practically the same estimates of consumption as do
readings from I and II. Obviously case II is simpler than case III, but the
latter is intended mainly as an illustration of successive reduction of
residuals to a minimum. In other cases of this type the second set of
residuals may fall along a uniform upward trend.6/

For a proper interpretation of the downward trend in residuals obtained
in case III, it should be noted that we related the variations as well as the
growth in business activity to the variations in consumption not explained by
price. The downward trend in residuals therefore is due to the upward trend in
business activity and reflects in part the fact that the relation of cotton
consumption to business activity has been changing with passing years, since business

5/ This is obvious from the fact that the straight lines of best fit for
any set of three observations are all practically parallel. See, for
instance, 1920-21-22; 1922-23-24; 1923-24-25.


6/ For a third method of handling the data in case II as well as for an
illustration of an upward trend in residuals, see "Some interrelationships
between the supply, price and consumption of cotton" by L. H. Bean, paper read
before the New York Section, American Statistical Association, April, 1928.
Here the index of cotton consumption was first divided by the index of
business activity. Consumption adjusted for general business activity was then
plotted against price adjusted for the general commodity price level, and a
downward trend in deviations established from a free hand price-consumption
curve. In recomputing or estimating consumption from these factors the sums
of readings from the price-consumption curve and the downward trend in
residuals were multiplied by the index of business activity.

activity in the United States has expanded more rapidly than has cotton
consumption during the ten-year period 1919-1928.

CASE  IV 

Effect of supply and other factors
on the yearly average farm price of apples

The fourth example illustrates a simple procedure that may be followed
in instances where the final residuals follow an irregular trend instead
of a straight upward or downward line. As in the case of a regular straight
line trend in residuals, an irregular one may also be considered as the
composite effect of "other" factors not included in the analysis, but related
to time.

This example deals with the yearly farm price of apples. The
independent variables are the total supply of apples in the United States, the
general level of food prices at wholesale, and "other" factors represented
by the trend in residuals.

The procedure followed in this case is first to eliminate the effect
of the general food price level by dividing the series of apple prices from
1910 to 1927 by a food price index, on the assumption that the price of
apples usually fluctuates with the major movements of food prices in
general. The next step is to plot in the scatter diagram of section 1,
Figure 4, the prices of apples in terms of 1926 food dollars against total
production, and by inspection to determine the net effect of total supply on
the adjusted price.
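
The first step, dividing a price series by an index, is simple arithmetic and can be sketched as below. The function name `deflate` and the figures are assumptions made for illustration; the numbers are invented and are not the apple prices or the food price index used in Figure 4. The base period of the index is taken as 100.

```python
def deflate(prices, index, base=100.0):
    """Express each price in dollars of the index base period."""
    if len(prices) != len(index):
        raise ValueError("prices and index must cover the same years")
    return [p / i * base for p, i in zip(prices, index)]


# Invented figures for illustration only (not the apple price series).
apple_prices = [120.0, 150.0, 90.0]   # current cents per bushel
food_index = [80.0, 100.0, 120.0]     # base period = 100

adjusted = deflate(apple_prices, food_index)
print(adjusted)
```

A price recorded when the index stood below 100 is raised by the adjustment, and one recorded when the index stood above 100 is lowered. The adjusted series is what would be plotted against total production.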

This curve, shown in the diagram, is the result of noting that a simple
curve "fits" the pre-war observations (excepting 1911 as an unusual year),
that a similar curve, somewhat higher, fits the observations for 1920-1923,
and also (somewhat lower) the observations for 1925-1927. These tentative
curves further indicate that an average supply and price curve drawn through
the scatter diagram would reveal considerable negative deviations for
1916-1918, positive deviations for 1920-1923, and a downward trend in residuals
since 1921. These residuals are shown in section 2, Figure 4, through which
an irregular trend has been drawn, a trend which dips down sharply in the
war years when the apple export market was completely shut off, rises sharply
to 1921, probably as a result of a recovery in foreign demand, and declines
since then, which may be attributed to increasing domestic competition from
other fruits.

For our present purpose we are not so much concerned with the various
factors which may be included as an explanation of these price residuals not
accounted for by total supply and the general level of food prices as we
are with obtaining the nature of the trend in the composite effect of all
"other" factors on the yearly price of apples. This trend, even though it
appears irregular when the entire 18-year period, 1910-1927, is considered,
indicates sufficient regularity during the last seven years to make an
analysis of this sort as useful as those already presented.

Now that we have two curves, one representing the effect of supply
and another the effect of all other factors associated with time (except
that represented by general food prices), the third step is to express in the
form of a third curve the one-to-one relationship between the index of food
prices and apple prices originally assumed in dividing apple prices by the
index. Curves I and II almost completely account for the price of apples
in terms of 1926 dollars. Consequently, differences between readings from
these two curves and the actual prices in current dollars may be taken as
the portions of price originally attributed to the factors represented by
the general food price level. The actual prices, when divided by 7/ the sum
of readings from curves I and II, may therefore be plotted against the food
price index, as in section 3. These observations, excepting two, lie
practically along a straight line (as was assumed) which may now be used in
conjunction with the other curves to obtain price estimates. In section 4,
the actual prices are compared with those obtained by multiplying the sum of
readings from curves I and II by readings from curve III.
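
The recombination just described, the sum of readings from curves I and II multiplied by the reading from curve III, can be sketched for a single year. The function names and all figures are hypothetical and invented for illustration (they are not read from Figure 4), and curve III is taken here as the assumed one-to-one line, so that its reading is simply the food index divided by its base of 100.

```python
def index_ratio(actual_price, reading_1, reading_2):
    """Actual price divided by the sum of readings from curves I and II.

    This is the ratio that would be plotted against the food price index.
    """
    return actual_price / (reading_1 + reading_2)


def estimate_price(reading_1, reading_2, reading_3):
    """Sum of readings from curves I and II, multiplied by curve III."""
    return (reading_1 + reading_2) * reading_3


# Invented readings for one year (not taken from Figure 4).
supply_effect = 100.0   # curve I, cents in base-period dollars
trend_effect = -10.0    # curve II, cents in base-period dollars
food_index = 120.0      # base period = 100
actual = 108.0          # actual price, current cents

reading_3 = food_index / 100.0   # one-to-one relation assumed for curve III
ratio = index_ratio(actual, supply_effect, trend_effect)
estimate = estimate_price(supply_effect, trend_effect, reading_3)
print(ratio, estimate)
```

Multiplication is used rather than addition because the first step of the analysis was a division by the index; the last step simply reverses it.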

CASE  V 

Effect  of  supply  and  other  factors  on 
the  New  York  price  of  California  oranges 

This example illustrates the application of the methods already
described to a problem in five variables which involves intercorrelation between
independent variables. It deals with the New York price of California
oranges as the dependent variable and the total production of oranges, the
production of competing fruits, the general level of food prices, and factors
related to time, as the independent variables.

The steps involved in this analysis are similar to those already
described and need only brief comment, but an additional one is employed here
to eliminate the intercorrelation that exists between the production of
oranges and the production of other fruits. In the other problems,
this intercorrelation was eliminated by selecting observations constant as
to one factor in determining the first approximation to the net regression
curve for a second factor.

In section 1 of Figure 5 the total production of oranges in the United
States is plotted against the price of oranges (November-October), adjusted
for changes in the index of food prices at wholesale, and the curve
representing the effect of production on price is drawn in, as suggested by the
tentative curves passing through the observations for 1920-22 and 1923-27. The
location of these two lines indicates an upward trend in deviations from an
average curve for the entire period.

In section 4 the United States production of oranges is plotted
against an index of production of competing fruits which appear on the market
during the crop year for oranges, November-October. The intercorrelation
is such that large crops of oranges are usually accompanied by large crops
of other fruits in the aggregate, and vice versa; also, year to year changes
in the orange crop are generally accompanied by similar changes in the
composite of competing production.

Before attempting to explain the deviations of orange prices from I
by competing production, it is desired to exclude from the latter
the changes in orange production which are already taken into account in
section 1, that is, the effect of competing products which is attributed to
oranges in section 1. We may do this by drawing curve IV, the slope of which
is suggested by the lines passing through the points 1921, 1922, 1925,

7/ Division instead of subtraction because the first step was a division of
the actual price by the index.


EFFECT  OF  SUPPLY  AND  OTHER  FACTORS  ON  THE 
NEW  YORK  PRICE  OF  CALIFORNIA  ORANGES 
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figure  5 


Changes  in  u.S.  cotton  acreage,  1920-1928 

CHANGES  MILLION  CENTS 


and  1920,  1927,  1923.  The  deviations  from  this  average  relation  between 
orange  production  and  production  of  competing  fruits  may  now  he  plotted 
against  the  price  deviations  from  I.  This  is  done  in  section  2. 

In section 2 the location of observations for 1923 and 1922 and for 1924,
1925, 1926, and 1927 indicates that the net effect of variations in
competing production on the price deviations is a downward curve similar to
the one shown. Study of the location of observations in relation to this
curve, or a similar curve in any other vertical position, again reveals an
upward trend in deviations, which we may now plot in section 5, passing
through them a smooth upward sloping curve. As in the other cases, or
examples, residuals from V should be plotted against each of the other
curves to see if any changes are needed in their net shapes.


Thus we have in curves I, II and V (as finally modified, if necessary)
practically a complete explanation of the price of oranges in terms of 1926
dollars. The final step in this analysis is to obtain estimates of orange
prices which we may compare with the actual prices. Inasmuch as the actual
prices were divided by the food index, we desire to express graphically the
assumed relationship of the food price index to orange prices. As in the
preceding example, this is obtained by plotting the index against ratios
obtained by dividing the actual price by the sum of readings from curves I,
II and V. These readings, it should be clear, may be taken as explaining
the variations in orange prices due to all factors here dealt with other than
those represented by changes in the food price level. The sum of the
readings from I, II and V, when multiplied by readings from this curve III,
gives the desired price estimates, which can be compared directly with the
actual prices, as is shown in section 6.
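
The arithmetic of this final step may be sketched as follows. The function
names and the figures are hypothetical and serve only to illustrate the
division and multiplication described above; in practice the readings would
be taken from the charts.

```python
def index_ratio(actual_price, readings):
    """Ratio of the actual price to the summed readings from curves I, II and V.

    These ratios, plotted against the food price index, suggest curve III.
    """
    return actual_price / sum(readings)


def price_estimate(readings, curve_iii_reading):
    """Estimated actual price: summed readings times the reading from curve III."""
    return sum(readings) * curve_iii_reading


# Hypothetical readings for one year (not taken from Figure 5).
readings = [4.0, -0.5, 0.5]                 # curve I, curve II, curve V
ratio = index_ratio(5.0, readings)          # actual price assumed to be 5.0
estimate = price_estimate(readings, ratio)  # curve III reading set equal to the ratio
```

When the reading from curve III equals the ratio for that year, the estimate
reproduces the actual price exactly; any scatter of the ratios about curve III
appears as a difference between estimated and actual prices.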


The illustrations so far have dealt with relatively simple types of
curves which describe the effect of one variable on another. In the last
case the curves to be developed, namely, the effect of price on subsequent
changes in acreage (Case VI), are somewhat more complicated, but their
essential characteristics are easily revealed.

CASE VI

Effect of price on acreage of cotton
harvested in the United States

This example illustrates the application of a simple approach to
curvilinear correlation in cases where the dependent variable is expressed
in first differences or in percentage changes from one year to the next,
such treatment being the best approach in analyses of acreage changes. In
these analyses it is usually found that changes in one variable (acreage)
from one year to the next respond to the price received by producers for
the preceding and second preceding crop. Thus the changes in cotton acreage
from one year to the next, and not the absolute acreage, can be explained by
prices, low prices in one season tending toward reduction in acreage and
high prices toward expansion.
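
The pairing of each year's acreage change with earlier prices can be
sketched as follows. The function name and the acreage and price figures are
hypothetical, chosen only to show the bookkeeping, and are not the data of
Figure 6.

```python
def lagged_pairs(acreage, price, lag=1):
    """Pair each year's change in acreage with the price received `lag` crops earlier.

    `acreage` and `price` are dicts keyed by crop year. Years lacking either
    the preceding acreage or the lagged price are skipped.
    """
    pairs = {}
    for year in sorted(acreage):
        previous, source = year - 1, year - lag
        if previous in acreage and source in price:
            pairs[year] = (acreage[year] - acreage[previous], price[source])
    return pairs


# Hypothetical figures (million acres, cents per pound), for illustration only.
acreage = {1921: 30.0, 1922: 33.0, 1923: 37.0, 1924: 41.0}
price = {1920: 16.0, 1921: 17.0, 1922: 23.0, 1923: 29.0}

one_year = lagged_pairs(acreage, price, lag=1)   # price of the preceding crop
two_year = lagged_pairs(acreage, price, lag=2)   # price of the second preceding crop
```

The first set of pairs supplies the points for the first scatter diagram; the
second supplies the prices against which the residuals are later plotted.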

The method used in analyzing a case of this sort is shown in Figure 6.
Section 3 contains the absolute acreages of cotton and the average price
received by producers, the price used here being adjusted for changes in the
general level of farm prices (1910-14 = 100). In section 4 are shown the
changes in acreage from one year to the next, 1921 to 1928. These acreage
changes are next shown in section 1, plotted against the price received
during the year immediately preceding. Thus, in 1924 the reduction in
cotton acreage of nearly 4.2 million acres is plotted against an average
price received for the 1923 crop of 21.7 cents.

A tentative curve was then drawn in of a type which is characteristic
of the effect of price on subsequent changes in acreage of such crops as
potatoes, sweet potatoes, cotton, cabbage and wheat. This type of curve
indicates that high prices in any given year result in a limited expansion
of acreage, but higher prices do not produce any greater expansion.
Reductions in acreage of the above-mentioned crops (except wheat) due to low
prices are not as limited the first year as are increases, and lower prices
bring still greater reductions. Residuals from that tentative curve were
next plotted in section 2 against the price secured two years preceding the
year of acreage changes, and this indicated approximately a linear
relationship. After studying the first distribution of residuals in
section 2, adjustments were made in section 1 so as to obtain residuals for
section 2 which would give a minimum of deviations from a curve in
section 2. The prices two years preceding indicate a slight additional
increase in acreage for very high prices, but no further decreases for low
prices. As in the preceding cases, readings from curves I and II give the
estimates in section 4. 8/

In studying the relation of price to changes in acreage of other
crops it will be found that the residuals from I (effect of price one year
preceding) need other factors (such as prices three years preceding, prices
of competing crops, weather, trends, etc.), in addition to the price two
years preceding, for their complete explanation, or reduction to a minimum.
But these additional factors can be handled by methods already illustrated
in the preceding cases.

CASE  VII 

Application  of  simplified  methods 
to  a general  problem  in  multiple  curvilinear  correlation 

The methods of graphic correlation used in the foregoing specific
cases may be summarized by applying them to a general problem typical
of the cases that are most often encountered in actual practice. For this
purpose we take the data given in Table 3 for four variables, X1, X2, X3,
X4, and 30 sets of observations. The variations in X1 are such that they
correlate perfectly with X2, X3, and X4. Our problem is to apply the
simplified method of obtaining directly by inspection the net relationships
between each of the three independent variables, X2, X3, and X4, and X1,
the dependent, without the use of the usual mathematical procedure. When
this has been done we shall compare the results with those obtained by the
mathematical procedure, and with the true relationships, and shall find that
the simple approach gives in much less time and labor practically the same
net curves.


8/ For a more detailed analysis of changes in cotton acreage see Factors
affecting cotton prices, U. S. Department of Agriculture Bulletin No. 50,
by Bradford B. Smith.

The entire process is contained in the two accompanying Figures, 7
and 8, except the element of simple judgment which is required in studying
or inspecting the independent variables before drawing the first
approximations of the net regressions, and no mathematical computations are
involved other than the simple one of reading or measuring distances from
curves and plotting such deviations in subsequent scatter diagrams.

In the following pages the steps of the simple procedure will be
restated in the terms of the present problem of four variables. All the
details will be given so that generalizations will be unnecessary.

We may, however, note again that instead of establishing tentative net
linear regressions by mathematical correlation we shall make use of scatter
diagrams only and by inspection determine directly a tentative, but very
close, approximation to the true net curvilinear regressions, which will
require only minor changes in the form of final approximations, also to be
made graphically.

The only computations involved are those required to obtain the
index of correlation, but this is relatively simple, calling only for the
sum of readings from the final curves (the usual X1'), subtracting them from
the actual values (X1), computing standard deviations for the actual values
and for the final residuals (X1 - X1'), and substituting these in the
formula for the index of correlation (ρ).
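
A minimal sketch of this computation follows, assuming the index of
correlation takes its usual form, ρ = sqrt(1 - σz²/σ1²), where σz is the
standard deviation of the final residuals and σ1 that of the actual values.
The function name and the sample figures are illustrative only and are not
taken from Table 3.

```python
import math
import statistics


def index_of_correlation(actual, estimated):
    """Index of correlation from actual values and summed curve readings.

    Assumes rho = sqrt(1 - sigma_z**2 / sigma_1**2), with both standard
    deviations computed about their own means over all observations.
    """
    residuals = [a - e for a, e in zip(actual, estimated)]
    sigma_1 = statistics.pstdev(actual)      # spread of the actual values
    sigma_z = statistics.pstdev(residuals)   # spread of the final residuals
    return math.sqrt(max(0.0, 1.0 - (sigma_z / sigma_1) ** 2))


# Hypothetical values: five observations and the readings estimated for them.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
estimated = [1.5, 1.5, 3.0, 4.5, 4.5]
rho = index_of_correlation(actual, estimated)
```

The smaller the residual scatter relative to the scatter of the actual values,
the nearer the index comes to 1.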

The steps now to be indicated in detail are:

1. Plotting three scatter diagrams, X1 with X2, X1 with X3, and X1 with
X4, to determine by inspection if possible which of the three
independent variables is the most important in the variations in X1.

2. Determining by inspection a first approximation to the net relation
between X1 and X2.

3. Determining by inspection which of the remaining two variables,
X3 or X4, is the more important in the X1 variations not accounted
for by X2, and plotting against it (X4) the residual variations
from X1 X2.

4. Determining by inspection the first approximation to the net
relation between X4 and the residuals from X1 X2.

5. Plotting the residuals from the curve established in 4 against X3
and determining the relation of X3 to these final residual values
of X1.

6. Plotting the residuals from the curve established in 5 as
deviations from the other two first approximation curves and making
second approximations, where necessary, to reduce the residuals
still further.
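
For readers who wish to imitate these six steps numerically, the following
sketch may serve. It is not the graphic procedure itself: the freehand
curves are replaced by averages of residuals within groups of the
independent variable, the first curve starts from the gross rather than the
net relation, and all names and data are hypothetical.

```python
from statistics import mean


def bin_curve(x, y, width):
    """Stand-in for a freehand curve: the mean of y within each group of x."""
    groups = {}
    for xi, yi in zip(x, y):
        groups.setdefault(int(xi // width), []).append(yi)
    means = {key: mean(values) for key, values in groups.items()}
    return lambda xi: means[int(xi // width)]


def successive_curves(x1, factors, width, passes=2):
    """Fit one curve per factor to the residuals left by the other curves.

    `factors` maps a name to its list of values, most important factor first.
    The second pass corresponds to the second approximations of step 6.
    """
    curves = {name: (lambda value: 0.0) for name in factors}
    n = len(x1)
    for _ in range(passes):
        for name, values in factors.items():
            # Residuals of X1 from the readings of all the other curves.
            partial = [
                x1[i] - sum(curves[o](factors[o][i]) for o in factors if o != name)
                for i in range(n)
            ]
            curves[name] = bin_curve(values, partial, width)
    estimates = [
        sum(curves[name](factors[name][i]) for name in factors) for i in range(n)
    ]
    return curves, estimates


# Hypothetical data: every combination of X2 and X3 in 0, 1, 2,
# with X1 = 2*X2 + X3**2 so that the true net curves are known.
x2 = [a for a in range(3) for b in range(3)]
x3 = [b for a in range(3) for b in range(3)]
x1 = [2 * a + b ** 2 for a, b in zip(x2, x3)]
curves, estimates = successive_curves(x1, {"X2": x2, "X3": x3}, width=1)
largest_residual = max(abs(y - e) for y, e in zip(x1, estimates))
```

With these balanced hypothetical data the residuals vanish after the first
pass. With intercorrelated factors the gross first curve would be distorted,
which is the reason the text selects observations constant as to the other
factors before drawing the first approximation.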

By plotting in the form of scatter diagrams X1 and X2, X1 and X3, and
X1 and X4, it became evident that the correlation between X1 and X2 is
greater than between X1 and either X3 or X4. We therefore select the X1 X2
scatter diagram and proceed to find the nature of the relation between X1
and X2. That diagram is shown in Figure 7. (As the other two scatter
diagrams are not necessary hereafter, they are not presented here.) In
Figure 9 we have also plotted the variables X3 and X4, having numbered them
consecutively from 1 to 30, which we shall need to "inspect" as we proceed
to determine the several net regressions. It should be observed that it is
essential in our procedure that the identity of the individual observations
be maintained.

In studying the scatter diagram X1 X2, we need to answer the question,
Is the relation between X1 and X2 (with X3 and X4 constant) positive, as
indicated in the diagram, or negative? Is it linear or curvilinear? To
answer these questions, we make use of the fact that if the relation
between X1 and X3 and X1 and X4 could be held constant simultaneously for
two or more observations, the comparable observations in X1 X2 would lie
along a line, either linear or curvilinear, which would indicate the true
regression for X1 X2 for those two or more observations only. Now X3 and X4
in any two or more observations would bear a constant relation to X1 under
either of these two conditions: (1) if the X3 values and the X4 values
were all equal, or (2) if the X3 values were equal and the X4 values were
equal. We therefore proceed to inspect the actual values of X3 and X4 for
such combinations (see Figure 9) and note first that the observations
numbered 6, 7, 10, 23 and 26 show equal values for both X3 and X4.

We next find the comparable observations, 6, 7, 10, 23, and 26, in the
X1 X2 scatter diagram and note that they appear to lie along a straight
line, which is tentatively drawn in. Further inspection of X3 and X4
reveals that in observations 1 and 29 and 2, 8, 14, they have approximately
the same values. We find and connect the comparable observations in X1 X2.
We also note that in observations 5 and 28, X3 has low but nearly equal
values and X4 has high but nearly equal values. As before we find and
connect the 5th and 28th observations in X1 X2. From the fact that
the several lines so drawn and distributed through the diagram are nearly
parallel, it is evident that the true regression of X1 X2 is a straight
line of the slope indicated by the parallel lines, and we proceed to draw a
first approximation of that regression line through the body of the scatter
diagram. (Had the true regression been curvilinear, a line connecting more
than two observations for constant values of X3 and X4 would have revealed
it.) This first approximation may now be taken as the tentative measure
of the relation between X2 and X1, to be modified later if necessary, and
the vertical deviations from this regression may be assumed to be related
to X3 and X4.
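
The inspection just described can be imitated mechanically. The sketch
below, in which the function name, the tolerance, and the five observations
are all hypothetical, finds every pair of observations whose other factors
are equal within a tolerance and reports the slope of the line joining them
in the X1 X2 diagram. Nearly equal slopes suggest a linear net regression.

```python
from itertools import combinations


def constant_factor_slopes(x1, x2, others, tol=0.0):
    """Slopes of lines joining observations that agree in all other factors.

    Returns (i, j, slope) for each pair, numbering observations from 1 as
    on the charts. `others` is a list of the remaining factor series.
    """
    slopes = []
    for i, j in combinations(range(len(x1)), 2):
        same = all(abs(series[i] - series[j]) <= tol for series in others)
        if same and x2[i] != x2[j]:
            slopes.append((i + 1, j + 1, (x1[j] - x1[i]) / (x2[j] - x2[i])))
    return slopes


# Hypothetical observations in which X1 = 2*X2 + 10/X3.
x2 = [1, 4, 2, 6, 3]
x3 = [2, 2, 5, 5, 10]
x1 = [2 * a + 10 / b for a, b in zip(x2, x3)]
slopes = constant_factor_slopes(x1, x2, [x3])
```

Here observations 1 and 2 share one value of X3 and observations 3 and 4
another; both connecting lines have a slope of 2.0, the net effect of X2,
although the gross scatter of X1 on X2 is disturbed by X3.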

Our next step involves measuring or reading the differences between
X1 and the X1 X2 tentative regression, and plotting them against either
X3 or X4. It is immaterial which of these independent factors is used
first, but for convenience we may choose the one which appears to have the
greatest influence on the X1 residuals. Note that the greatest negative
deviations from the X1 X2 regression, such as numbers 3 and 29, are
associated with very small values of X4 for those observations, and the
greatest positive deviations, 5, 20, and 28, are associated with very large
values of X4. These facts suggest that X4 may be the dominant factor in
determining the positive and negative residuals. They also suggest that
the relation to be expected between X4 and the residual values of X1 is of
a positive character. Incidentally, this method of inspection also throws
some light on the nature of the relation of X3 to the residual values of X1.
For example, we note that number 18 among the observations in X1 X2
is well above the X1 X2 regression, but instead of being associated with
a high value for X4, as in the other instances of large positive residuals,
it is associated with a low value of X3. This suggests that the relation
between X3 and X1 may be of a negative sort (at least for low values of
X3).

By plotting the vertical deviations from X1 X2 against X4, as is
done in Figure 8, Section 2, we obtain a scatter diagram in which we desire
to discover the nature of the relation of X4 to the residual values of X1
(that is, to X1 from which the effects of X2 have already been removed).
Inasmuch as these residual values in Section 2 are related to X4 and X3,
we may proceed to find the relation of X4 to the X1 residuals by selecting
those observations in which X3 values are equal. In the observations
numbered 1, 9, 15, 24, 28, 29, the X3 values are equal. Connecting the
comparable observations in Section 2 we obtain a curve of a positive slope,
which appears to fit the scatter very well.

The adequacy of this curve may now be checked by selecting constant
values of X3 for large values of X4 and also for low values. Consequently
we note that in observations 2 and 20, X3 has equal values. Connecting
the two corresponding points in Section 2, we obtain a portion of a curve
which is approximately parallel to the curve (for the high values of X4)
already drawn in through observations 1, 9, 15, 24, 28, 29. Similarly
the X3 values in 19 and 11 are practically equal, and a line connecting
the comparable points in Section 2 is approximately parallel to the first
curve (for the low values of X4). If now we take the first curve (drawn
through 1, 9, 29) as the first approximation of the net relation of X4 to
the X1 residuals, we note that many of the observations in Section 2 do not
lie on that curve, presumably because of the influence of X3. The influence
of X3 may now be observed by plotting the difference between the
observations and the tentative curve in Section 2 against the comparable
values for X3. This step is shown in Section 3. The nature of the relation
of X3 to the residual values of X1 is immediately evident. Instead of the
positive gross relationship indicated by the scatter diagram of X1 X3
made at the beginning of the analysis, we now find a negative net
regression, particularly pronounced for low values of X3.


It is evident from the relatively narrow scatter of the observations
in Section 3 about the first approximation curve drawn through them
that by means of the three net regression curves developed so far we have
accounted for nearly all of the variations in X1. It remains now to see
if some slight adjustments in these curves will reduce the scatter of the
final residuals about the X3 curve in Section 3 still more. At this
point, if desired, the standard deviation from the X3 curve in Section 3
and the standard deviation of the original values of X1 may be computed
to determine the extent to which the three net curves account for the
variations in X1. Substituting these standard deviations in the formula
for ρ, an index of correlation of .997 is indicated, the standard deviation
of the final residuals being .46.

We may now complete the analysis by making a final test of the
adequacy of the first approximation net regression curves. Here we
follow the usual procedure of plotting the final residuals (as shown by
the scatter around the curve in Section 3) as deviations from the curve
in Section 1. This step is shown in Figure 8, to which have been
transferred the first approximations from Figure 7. The scatter of the
residuals about the X1 X2 curve indicates that no material adjustment in
the shape or slope of that curve is necessary. The residuals are next
plotted as deviations from the first approximation curve in Section 2.
Here the scatter about the X4 curve (in Section 2) does indicate that a
slight raising of the first approximation curve for the higher values of
X4, as well as for the very low ones, would reduce some of the residuals
still more. This adjustment, drawn in by inspection, our second
approximation for the X4 curve, is shown by the solid line in Section 2.
Had the scatter been wider about this curve it probably would have been
desirable to follow the usual procedure of averaging or grouping the
deviations according to the values of X4 in order to determine more exactly
the shape of the second approximation curve. The scatter about this second
approximation X4 curve now indicates how much of the variations in X1 can be
accounted for by the three curves (first approximation of X1 X2, second
approximation of X1 X4, and first approximation of X1 X3).

The reduced residuals, that is, the deviations about the second X4
curve, are next plotted as deviations about the first approximation X3
curve in Section 3 (Figure 8), to test the adequacy of that curve. This
scatter indicates the desirability of lowering somewhat the first
approximation X3 curve. This adjusted curve now becomes the second
approximation X3 curve.

The extent to which these two adjustments have reduced the first
set of residuals may now be seen either in the extent of the deviations
about the second approximation X3 curve or by computing values for X1
from the three final net regressions. The readings from these curves, and
the differences between them and the actual values of X1, are given in
Table 3. The standard deviation of these differences, or final residuals,
is .41, and the index of correlation is .998 (compared with .46 and .997,
respectively, for the readings from the first approximations).

Comparison  between  the  approximation  curves  and  the  true  curves. 

In order to determine whether the results obtained by the simplified
approach to curvilinear correlation are reasonably accurate, we may compare
them with the true curves and with the approximations that are obtainable
by the usual method, which involves the mathematical determination of linear
net regressions and the reduction of residuals to a minimum by successive
approximations, as described by Ezekiel. 9/

To facilitate this comparison we used in this general illustration
the data given by Ezekiel in his "Method of Handling Curvilinear
Correlation" 1/ which are described by the formula
X1 = X2 + 22/X3 + 2√X4 - 5. The true net curves for X1 X2, X1 X3 and
X1 X4, derived from this formula, are indicated in Figure 9 and are
compared with the curves obtained by the simplified method of correlation.
The approximation curve in section 1 has only a slightly different slope
from that of the true curve. The approximation curve in section 2 does not
rise as much for values of X4 between 0 and 5 as does the true curve, but
this is due to the fact that the data used in this problem had no values
for X4 between 0 and 5. The approximation in section 3 also differs only
slightly from the true curve.

9/ See Journal of the American Statistical Association, December 1924.

[Figure 9. Values of X3 and X4]
These close agreements between the true curves and the approximations
may be compared with the results obtained by the Ezekiel method
by referring to page 446 of the December, 1924 Journal of the American
Statistical Association. It will be observed that the approximation
curves there derived show in general practically the same agreement with
the  true  curves.  For  the  curves  Xq  and  X^  Xq  the  agreement  is  somewhat 
closer  as  developed  here  (in  Figure  9), 

A correlation index of .994 was there obtained after three successive
approximations, which may be compared with the correlation index of
.998 after only one adjustment as indicated above.

The results obtained by this simple approach in the first six
illustrations are also practically the same as those obtainable through
the usual procedure. The data in Case II, for example, were correlated
by the usual method, which required a series of four approximations to
obtain a final correlation of .993. The simple approach gave in much
less time practically the same correlation, .995, and practically the same
net relationships between price and consumption, between business activity
and consumption, and the trend in residuals as were obtained by the
mathematical correlation.


Both methods of curvilinear correlation depend to some extent on
judgment. In the usual method, judgment comes into play in converting
linear regressions into curvilinear ones. In the simple approach, judgment
is brought into play in the process of determining first approximations
to net curvilinear regressions directly by inspection. In both
cases there is some freedom in shaping the curves. Where two independent
variables are highly correlated, and one of the net regressions is given
an inordinate slope, it will be compensated by a corresponding change in
the other curve. For example, had we drawn in Figure III, section 5,
a curve instead of a straight line, there would have appeared a
compensating difference in the slope of the trend line in section 6 without
any effect on the final correlation. As a matter of fact, in this
particular instance, the simpler approach reveals immediately that the
assumption that the true relationship is a straight line will agree with the
observations, while the usual approach without a similar process of
inspection would lead one to assume an illogical curvilinear function.
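
The compensation between two highly correlated factors may be seen in a
small numerical sketch. The data and slopes below are hypothetical, and the
two factors are made perfectly intercorrelated so that the effect is exact
rather than approximate.

```python
def estimates(x2, x3, slope2, slope3):
    """Estimates of the dependent variable from two linear net regressions."""
    return [slope2 * a + slope3 * b for a, b in zip(x2, x3)]


# Hypothetical factors: X3 is exactly twice X2 in every observation.
x2 = [1.0, 2.0, 3.0, 4.0]
x3 = [2.0 * value for value in x2]

# An inordinate slope on X2 is offset by a smaller slope on X3.
first = estimates(x2, x3, slope2=1.0, slope3=1.0)
second = estimates(x2, x3, slope2=3.0, slope3=0.0)
```

Both pairs of slopes yield the same estimates, and hence the same residuals
and the same final correlation, so that the data alone cannot choose between
them; the choice rests on judgment as to which shape is logical.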

The question may be raised whether the simple approach can be
conveniently used in problems involving more observations and more variables
than those in the illustration.

Inasmuch as the facility of this method depends on detecting
approximate net regressions by inspection instead of by mathematical
computation of linear regression and successive approximation, too many
observations in a scatter diagram are likely to make it difficult to find
the true relationship. But this limitation can be overcome by splitting
the problem into two or more sections and treating each separately.
Something of this sort was indicated in example IV, where the average net
effect of supply on the price of apples was determined by establishing
tentative curves for the pre-war and post-war years. Such treatment of
a long time series of observations is in fact likely to give much truer
and more reliable relationships, particularly if the same or similar curves
are found to hold good in each of two or more periods.

For problems of 30 observations and 4 variables, such as that treated
by Ezekiel in illustrating his method of curvilinear correlation, 1/ the
simple approach is, as we have seen, very satisfactory. That set of data
when treated by methods described here gave the same net curves and the
same high correlation, but the time involved was only about one-fourth as
much.


Another question concerns the significance or reliability of the
results obtained by this method for such short time analyses as have been
presented here, particularly in Cases I to VI. We have two tests that may
be applied.


One is to repeat the analyses for another period or for a longer
one to determine whether similar results will be obtained. This, however,
assumes that economic relationships remain unchanged from one period to
the next, which is not necessarily so. When an analysis for an earlier
period corroborates the result of a more recent one, it lends greater
confidence in the latter; but if it does not agree, it does not invalidate
the latter, for different sets of forces may be at work in one period
than in another. Thus, in the acreage analysis referred to on page 10
the world war and the boll weevil were factors in the earlier part of the
period, but not in the more recent years. However, the net effects of
price on acreage in Case VI are of the same type as those derived by the
more detailed study based on the formal approach.


A second test is the practical one of applying the results obtained
for a short period to the year or years immediately preceding or following.
In each of the cases presented, and a number of others that might
have been presented, very satisfactory results were obtained when the
relationships established for the period ending with 1927 were applied to
1928. This, however, is like the preceding test in that, if the
relationships established for a given period hold also for a year outside
that period, we may have greater confidence in the established
relationships, but they are not necessarily invalidated if they do not apply
with equal accuracy to outside years.


Finally, for those who are accustomed to thinking of goodness of
fit and reliability of results in terms of correlation coefficients,
correlation indexes, and standard errors, it may be of interest to point out
how nearly we have accounted for all the variations in the dependent
variables in each of the foregoing problems.

For this purpose we may make use of the correction that must be applied to
correlation coefficients, taking into account the number of variables or
constants determined in the regression equation and the number of
observations, the need and significance of which have already been described
in the Journal 10/ of the American Statistical Association. The correction
indicates to what extent the observed correlation coefficients in a sample
may overstate the true correlation existing between the same variables in
the universe from which they were selected. When applied to a correlation in
time series, correction for the number of variables and observations does
not have the usual significance, for a time series is not a sample. However,
even though correlation coefficients in time series lack the significance
that they have in problems where different samples may be drawn, some
readers may be interested in the correlation indexes determined for the six
foregoing cases as well as the seventh.


In the following tabulation are given first the original multiple
correlation indexes and the standard deviations of the residuals, and in
the last two columns, the indexes and standard errors after correcting for
the number of observations and the maximum number of variables and constants
that may be assumed to be represented in the net curves. 11/

In these cases of very high correlations the corrections for the number of
variables and constants are not material, and, in so far as this criterion
is concerned, the corrections do not impair the validity of our results.


                                Number of
                                variables    Number of    Corrected
  Case          P       σz      and          obser-       P           Se
                                constants    vations

  I ........   .997    2.67        5            8         .992       3.38
  II .......   .995    1.0         4           10         .991       1.3
  III ......   .995    1.0         5           10         .989       1.4
  IV .......   .998    2.37        5           18         .997       2.79
  V 1/ .....   .997     .07        7            8         .978        .20
  VI .......   .986     .69        5            9         .958       1.04
  VII ......   .998     .41        6           30         .997        .46

1/ Based on estimates in terms of prices adjusted for changes in the

10/ Proceedings of the American Statistical Association, December, 1928.
Paper by M. J. B. Ezekiel on Application of Theory of Error to Multiple
and Curvilinear Correlations.

11/ The corrections are made by the use of these two formulae:

\[ \text{Corrected } P:\qquad \bar{P}^{2} = 1 - \frac{1 - P^{2}}{1 - \dfrac{m}{n}} \]

and

\[ \text{Corrected } \sigma_z:\qquad S_e^{2} = \frac{n\,\sigma_z^{2}}{n - m} \]

where n is the number of observations and m the number of variables and
constants.

The formula for P is that developed by B. B. Smith and applied to
curvilinear correlation by M. J. B. Ezekiel. The formula for Se is R. A.
Fisher's formula (Statistical Methods for Research Workers, p. 135) restated
by M. J. B. Ezekiel.
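The two corrections can be checked numerically. The following is only a minimal sketch, assuming the formulae read as corrected P squared = 1 - (1 - P squared) / (1 - m/n) and Se squared = n times sigma_z squared / (n - m); the function names are illustrative and not part of the original report. The figures used are those tabulated for Case IV.

```python
import math

def corrected_index(p, m, n):
    """Corrected index of correlation: 1 - (1 - P^2) / (1 - m/n), then square root."""
    p_bar_sq = 1.0 - (1.0 - p * p) / (1.0 - m / n)
    return math.sqrt(max(p_bar_sq, 0.0))  # guard against a negative value

def corrected_standard_error(sigma_z, m, n):
    """Corrected standard error: square root of n * sigma_z^2 / (n - m)."""
    return math.sqrt(n * sigma_z ** 2 / (n - m))

# Case IV as tabulated: P = .998, sigma_z = 2.37, m = 5 constants, n = 18 observations
p_bar = corrected_index(0.998, 5, 18)
s_e = corrected_standard_error(2.37, 5, 18)
print(round(p_bar, 3), round(s_e, 2))
```

With few observations relative to the number of constants (as in Case V, with m = 7 and n = 8) the denominator n - m becomes small, which is why the correction matters most in short series.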
Table 1 - Data used in Cases I - IV

                      Case I                           Cases II and III
        -------------------------------------  ----------------------------------
                     Price per    May price    Index of    Price      Index of
                     bushel       per cwt. of  cotton      per lb.    production
 Year   Production   received by  old          consump-    of         of mfrs.
        1/           producers    potatoes,    tion 2/     cotton 3/
                                  Chicago
        Million
        bushels      Dollars      Dollars      Per cent    Cents      Per cent

 1919      --           --           --           96        22.0         84
 1920      --           --           --           95        24.7         86
 1921     21.2         1.12          .87          88        14.5
 1922     24.1         1.34         1.70          99        18.2         87
 1923     18.7         1.67         1.13         106        25.3        101
 1924     29.4          .99         1.50          89        30.6         94
 1925     20.4         1.41         1.13         105        24.6        105
 1926     23.7         1.72         3.23         109        19.7        108
 1927     29.6         1.55         3.51         122        15.3        106
 1928     37.4          .65         1.43         107        20.4        111

1/ Potato production, 10 early states.
2/ 1923-25 = 100, Federal Reserve Board.
3/ At New Orleans, crop year ending in the indicated calendar year.


                                 Case IV
        ------------------------------------------------------------------
        Price per bush.   Total pro-    Index of      Ratio of actual prices
 Year   received by       duction of    food prices   to prices read from
        producers 4/      apples        5/            curves I and II
                          Million
        Dollars           bushels       Per cent      Per cent

 1910      102.6            142            61.8             61.8
 1911       91.1            214            65.7             71.7
 1912       74.8            235            65.?             63.9
 1913      106.1            145            63.9             64.7
 1914       71.7            253            56.5             65.8
 1915       79.4            230            67.3             69.6
 1916      104.2            194            89.3             89.8
 1917      125.9            167           112.6            112.4
 1918      154.6            170           136.0            125.7
 1919      208.9            142           137.5            135.6
 1920      144.2            224           111.1            110.9
 1921      197.4             99            86.8             87.3
 1922      130.4            203            91.6             91.2
 1923      125.5            203            90.7             91.2
 1924      138.7            172            95.8             96.3
 1925      137.4            172           101.5            100.3
 1926       99.1            247            97.4            102.1
 1927      156.4            123            98.6             97.1

4/ Straight average, July-May.
5/ 1926 = 100, Bureau of Labor Statistics, average for July-June.




Table 2 - Data used in Cases V and VI

               Case V                               Case VI
        ----------------------  ---------------------------------------------------
        Changes    Price per    N. Y. price     Price in    U. S. pro-   Index of
        in         lb.          per box of      terms of    duction      production
 Year   cotton     received by  Calif. oranges  1926        of           of com-
        acreage    producers    Nov. - Oct.     dollars     oranges      peting
                   1/                           2/                       fruits 3/
        Million                                             Million
        acres      Cents        Dollars         Dollars     boxes        Per cent

 1918     --        14.2           --             --           --           --
 1919     --        16.0           --             --           --           --
 1920    + 2.3      10.4          5.75           5.18         29.9         94.1
 1921    - 5.4      14.3          7.19           8.28         20.1         83.5
 1922    + 2.4      17.5          5.29           5.78         29.9        107.9
 1923    + 4.1      23.7          5.39           5.94         34.2        106.7
 1924    + 4.2      16.1          7.01           7.32         28.0         97.5
 1925    + 4.7      13.7          6.15           6.06         31.0        113.3
 1926    + 1.0       9.7          5.58           5.73         35.7        125.4
 1927    - 6.9      14.6          6.60           6.70         33.5        103.2
 1928    + 4.7

1/ Weighted average farm price, August-July, divided by July-June index of
   farm prices, 1910-14 = 100.
2/ New York price divided by index of food prices; see column 3 under
   Case IV.
3/ 1919-1927 = 100. Index includes production of apples of one year and of
   peaches, pears, strawberries and grapes of the next, weighted by average
   prices received during 1919-1927.
Table 3 - Data used in Case VII

 Item           Raw data              Readings from curves                 Item
 number    X2    X3    X4    X1      I       II      III     Sum    Residual   number

   1       11    10     9    14     16.2    -2.5    -0.3    13.4    +0.6      1
   2       20    19    15    24     24.9    +0.2    -1.3    23.8    +0.2      2
   3        6     6     0     4     11.3    -8.1    +1.2     4.4    -0.4      3
   4        6    12     6     8     11.3    -2.8    -0.6     7.9    +0.1      4
   5        8     8    26    16     13.3    +2.5    +0.2    16.0     0        5
   6        9     8     8    12     14.2    -2.1    +0.2    12.3    -0.3      6
   7       11     8     8    13     16.1    -2.1    +0.2    14.2    -1.2      7
   8       14    16    16    18     19.0    +0.8    -1.1    18.7    -0.7      8
   9       12    10     0     9     17.0    -8.1    -0.3     8.6    +0.4      9
  10        8     8     8    11     13.3    -2.1    +0.1    11.3    -0.3     10
  11        4     5    10    11      9.4    -1.3    +2.4    10.5    +0.5     11
  12       23    26    26    28     27.7    +2.5    -1.6    28.6    -0.6     12
  13       14    12    10    17     19.0    -1.3    -0.6    17.1    -0.1     13
  14       10    16    14    14     15.2     0      -1.1    14.1    -0.1     14
  15       10    10    15    15     15.2    +0.2    -0.4    15.0     0       15
  16       20    13    20    26     24.9    +1.3    -0.7    25.5    +0.5     16
  17       12    12    12    16     17.0    -0.7    -0.6    15.7    +0.3     17
  18       10     2     8    21     15.2    -2.0    +7.4    20.6    +0.4     18
  19       16     6     5    19     21.0    -3.2    +1.2    19.0     0       19
  20       20    20    30    27     24.9    +3.1    -1.4    26.6    +0.4     20
  21       10    10    10    13     15.2    -1.3    -0.3    13.6    -0.6     21
  22        2     8     6     5      7.5    -2.8    +0.1     4.8    +0.2     22
  23        8     8     8    11     13.3    -2.1    +0.1    11.3    -0.3     23
  24       12    10    11    16     17.0    -0.9    -0.4    15.7    +0.3     24
  25       13     7    12    18     18.0    -0.7    +0.6    17.9    +0.1     25
  26       15     9     7    17     20.0    -2.5    -0.1    17.4    -0.4     26
  27       24    28    18    28     28.6    +1.1    -1.8    27.9    +0.1     27
  28       10    10    30    18     15.2    +3.1    -0.3    18.0     0       28
  29        4    10     9     8      9.4    -1.7    -0.3     7.4    +0.6     29
  30        8     6    10    13     13.3    -1.3    +1.2    13.2    -0.2     30

0 6.28 


A SIMPLIFIED METHOD OF GRAPHIC CURVILINEAR CORRELATION
APPLIED TO CHANGES IN ACREAGES, YIELDS, AND
LIVESTOCK NUMBERS

By L. H. Bean, Senior Agricultural Economist, Division of
Statistical and Historical Research, Bureau of Agricultural Economics

The simplified method of correlation already described in considerable
detail elsewhere 1/ may be further illustrated by applying it to three
additional problems dealing with actual changes in acreage, livestock numbers
and yields. The cases selected for illustration deal with changes (a) in the
United States acreage of cabbage, (b) in the total number of hogs on farms and
(c) in the yield per acre of wheat in an eastern State. It will be noted that
the first two of the present illustrations (VIII and IX) are similar to Case
VI already described, in that the dependent variables (acreage and hog numbers)
are expressed as first differences or absolute increases or decreases from the
preceding year's figure. The final illustration (X) is like the general
problem described under Case VII.

Case  VIII.  The  Relation  of  Price  to  Changes  in  the 
United  States  Acreage  of  Cabbage 

In this problem it is desired to correlate the price of cabbage received
by producers with subsequent changes in acreage and to develop two curves,
one representing the relation of the price received for the crop in the
first year preceding the acreage change and another, the relation of the
price received two years earlier. The prices used here have been adjusted
for changes in the general level of farm prices.

Our first step is to plot in a scatter diagram the price one year
preceding against changes in acreage (see Section 1, Figure 10). The second
step is to obtain for that scatter diagram an approximation to the relation
of price one year preceding to acreage changes, exclusive of the influence
of the price two years preceding. As an aid in making that approximation, we
examine the variations in the price two years preceding for equal or
approximately equal values of that variable and note (in Section 3, Figure
10) that the prices in 1920, 1924 and 1925 are approximately the same. Now
if the price two years preceding has any influence on acreage, then these
three similar prices should have about the same influence on the 1922, 1926,
and 1927 acreage changes. In other words, their effect in these three years
may be considered, tentatively, as constant. Consequently, we may draw a
line or curve through the 1922, 1926 and 1927 observations in Section 1,
thus obtaining for that section of the diagram a partial indication of the
nature of the net curve we are seeking. We note next that the prices of 1921
and 1923 are relatively high, the 1921 price being higher than the 1923
price. Inasmuch as the 1921 price is greater than the 1923 price, it is to
be expected that its influence on the 1923 acreage may be greater than that
of the 1923 price on the 1925 acreage. By connecting the 1923 and 1925
observations and allowing the curve to remain below the 1923 point because
of the greater influence just referred to, we obtain another indication of
the nature of the net curve or regression for another portion of the scatter
diagram. With these two lines drawn in, it now becomes obvious that the 1924
acreage is below that indicated by the line passing through 1922-26-27,
because the 1922 price was low.

1/ See Mimeographed Report, Part 1, on Applications of a Simplified Method of
Graphic Curvilinear Correlation.

The two segments drawn in so far may be taken to represent the relation
between price one year preceding and acreage with prices two years preceding
held constant respectively at the low prices of 1920, 1924, and 1925, and at
the high prices of 1921 and 1923. They indicate that the slope of the curve
for Section 1 rises sharply when prices one year preceding range between
$12.00 and $16.00, and that for higher prices the curve slopes upward very
moderately. Using these two segments as guides, we may draw a continuous
free hand curve as the first approximation to the net relation of price one
year preceding to acreage changes. The solid curve shown in Section 1 is
practically the first approximation.
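The search for years in which the price two years preceding is nearly the same can also be done by a simple lookup. The sketch below is illustrative only: the prices are the adjusted cabbage prices as read from the data table for Cases VIII and IX, while the function names and the band of $16.00 to $17.10 used to define "approximately the same" are assumptions made for the example.

```python
# Adjusted price per ton of cabbage by year, as read from the data table.
price = {1919: 19.58, 1920: 16.26, 1921: 28.52, 1922: 12.94, 1923: 23.28,
         1924: 16.04, 1925: 17.02, 1926: 19.03, 1927: 15.97, 1928: 24.43}

def lagged_prices(price, years):
    """Map each year to (price one year preceding, price two years preceding)."""
    return {y: (price[y - 1], price[y - 2]) for y in years
            if (y - 1) in price and (y - 2) in price}

def similar_lag2(lags, low, high):
    """Years whose price two years preceding lies within [low, high]."""
    return sorted(y for y, (_, p2) in lags.items() if low <= p2 <= high)

lags = lagged_prices(price, range(1921, 1929))

# Band chosen by eye for this example (an assumption, not from the report).
print(similar_lag2(lags, 16.0, 17.1))
```

The years returned are those whose observations may be joined in Section 1 with the second price treated as roughly constant.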

Now if the first approximation curve in Section 1 represents the acreage
changes that may be attributed to, or associated with, the price one year
preceding, the amounts of acreage change above or below that curve for the
years shown may be assumed to be due to the influence of the second factor
under consideration, namely, the price two years preceding. We therefore
proceed to relate the price two years preceding to those portions of acreage
changes not already attributed to the other price. This is most conveniently
done by measuring off or reading directly the differences between the
observations and the curve in Section 1, and plotting those differences
against price two years preceding in Section 2 of Figure 10. All of the
observations in Section 2 (except 1921) are found to lie along a fairly well
defined curve, and a free hand curve drawn through them may be taken as the
first approximation to the relation between prices two years preceding and
acreage changes.

Inasmuch as the observations in Section 2 show so little scatter about
the first approximation curve, both of the curves, in Section 1 and in
Section 2, may be taken as final. In cases where the scatter is wide, it is
necessary to test the validity of the first approximations. This can be done
by measuring the amounts that the observations in Section 2 are above or
below the curve first approximated in 2 and plotting these deviations above
or below the curve in 1. If these deviations, transferred from Section 2 to
the first approximation in 1, group themselves at any point consistently
above or below the first approximation curve in 1, the curve may be altered
so as to reduce the deviations at that point. This gives a second
approximation for Section 1. Deviations from this second approximation are
then plotted around the curve in Section 2 and the curve there adjusted if
the new residuals suggest it. This gives a second approximation curve for
Section 2. If necessary this process is repeated until the residuals are
reduced to a minimum.
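The alternation between the two sections can be imitated numerically. The sketch below is not the graphic method itself: it assumes, purely for illustration, that each free hand curve is replaced by a least-squares polynomial, and it runs on made-up data. It shows only the order of operations, in which each curve is refitted to what the other curve leaves unexplained.

```python
import numpy as np

def fit_curve(x, y, degree=2):
    """Stand-in for a free hand curve: a least-squares polynomial (an assumption)."""
    return np.poly1d(np.polyfit(x, y, degree))

def successive_approximation(x1, x2, y, rounds=5, degree=2):
    """Refit each curve in turn to the deviations left by the other curve."""
    f2 = np.poly1d([0.0])                      # no second curve yet
    history = []
    for _ in range(rounds):
        f1 = fit_curve(x1, y - f2(x2), degree)  # Section 1 curve
        f2 = fit_curve(x2, y - f1(x1), degree)  # Section 2 curve, fitted to residuals
        resid = y - f1(x1) - f2(x2)
        history.append(float(np.sqrt(np.mean(resid ** 2))))
    return f1, f2, history

# Hypothetical data: two price series and a dependent series with some noise.
rng = np.random.default_rng(0)
x1 = rng.uniform(12.0, 28.0, 40)
x2 = rng.uniform(12.0, 28.0, 40)
y = 30.0 - 0.2 * (x1 - 28.0) ** 2 + 1.5 * (x2 - 20.0) + rng.normal(0.0, 1.0, 40)

f1, f2, history = successive_approximation(x1, x2, y)
print([round(h, 4) for h in history])
```

Because each refit can only reduce the residuals for the curve being adjusted, the root-mean-square residual recorded after each round does not increase, which corresponds to repeating the process until no further alteration of either curve is suggested.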

Section  4 of  Figure  10  shows  the  usual  comparison  between  the  actual 
acreage  changes  and  those  estimated  from  the  two  price-acreage  curves. 



Case IX. The Relation of Price to Changes in the Number of
Hogs on Farms in the United States on January 1

This problem is similar to the preceding one in that the dependent
variable, changes in the number of hogs on farms, is here taken as absolute
first differences or changes from the numbers on farms on the preceding
January 1, and in that the independent variables are two price factors, one
being the corn-hog ratio for the first 12-month period preceding January 1
and the other, the corn-hog ratio for the second preceding 12-month period.
The method of determining the curves for each of those price influences is
similar to that shown for cabbage acreage in Case VIII. There is, however,
one important difference, namely, that for the period under consideration,
1920-1929, there appears to have been a downward trend in the relation
between the corn-hog ratio and the number of hogs on farms. In presenting
this problem, therefore, we shall refer only to this additional factor and
indicate how its presence may be detected and its influence held constant in
determining the relation of the other factors to the dependent variable.

In Section 1, Figure 11, is shown the final approximation of the relation
of the corn-hog ratio in the first preceding year to changes in hog numbers;
in Section 2, the final approximation for the corn-hog ratio the second
12-month period preceding; and in Section 3, the trend in the relationship,
or stated differently, the trend in changes in hog numbers not attributed to
or associated with the two corn-hog price series. We need to note only
Sections 2 and 3. Having drawn the curve in Section 1 by a procedure similar
to that already described in the preceding illustration and then having
plotted in Section 2 the residuals from the curve in Section 1, the problem
is to draw the first approximation curve for the effect of the corn-hog
ratio two years preceding. Here it is found that the observations do not
fall along a well defined curve and drawing the first approximation curve is
not as simple as it was in Case VIII (cabbage acreage changes and price two
years preceding). An inspection of the observations reveals first that the
relationship is probably positive, that is, that the curve rises with higher
corn-hog ratios, as in Section 1. It is next observed that any upward
sloping line that may be drawn through the observations would leave those
for the earlier years in the series above the line and those for the later
years below the line, indicating the presence of a downward trend that may
be associated with time. Thus, the problem becomes one of three independent
variables, and time, the third variable, must be taken into account and its
influence held constant in determining the nature of the relation between
the corn-hog ratios and changes in hog numbers.

The method of holding time constant, in determining the first
approximation in Section 2, is indicated by the dashed lines. The process is
simply to connect the observations in chronological sequence, bearing in
mind that if the trend factor is continuously downward, the connecting lines
should not cross, but should fall in descending order (or ascending order
where the trend is upward). When the observations in Section 2 are so
connected the general nature of the relation of the second corn-hog ratio to
hog numbers is sufficiently obvious, and a first approximation may be made
which is not materially different from the final one (shown in the solid
line). The curve shown here has been arbitrarily placed so as to show most
of the downward trend in residuals prior to 1927. It could of course have
been placed higher, but the effect of that would have been merely to lower
the curve in Section 3 in relation to the zero line in Section 3.

The next step in this type of problem, given a trend factor, is to plot
in Section 3 the deviations from the first approximation in Section 2, and
to pass through them a line of best fit. To test, finally, the goodness of
fit of the three curves so developed, residuals from the trend line should
be plotted as deviations from the other two curves in the usual manner.
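Passing a line of best fit through the deviations in Section 3 may be illustrated as follows. This is a rough sketch only: the residuals are hypothetical figures invented for the example (they are not the hog data), and an ordinary least-squares straight line is assumed as the "line of best fit".

```python
import numpy as np

years = np.arange(1920, 1930)
# Hypothetical deviations (millions of hogs) left after the two corn-hog curves.
resid = np.array([3.1, 2.4, 2.6, 1.2, 0.9, -0.4, -0.2, -1.5, -1.9, -2.8])

t = years - years[0]                          # time counted from the first year
slope, intercept = np.polyfit(t, resid, 1)    # least-squares trend line
trend = slope * t + intercept
final = resid - trend                         # what remains after removing the trend

print(round(float(slope), 3), round(float(final.std()), 3))
```

A negative slope corresponds to the situation described in the text, where earlier years lie above and later years below any line drawn through Section 2. The remaining deviations are those that would be carried back to the other two curves.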

For further applications of this method and a discussion of Cases VIII
and IX the reader is referred to the Journal of Farm Economics, July 1929,
"The Farmers' Response to Price" by the author.

Case X. The Relation of Three Weather Factors to
Wheat Yields in State "X"

In this final illustration our object is to apply the simplified
correlation method to a yield problem by correlating three weather factors,
rainfall, snow cover and temperature, with the yield of wheat in a certain
State for a selected period of ten years. The purpose of this illustration
is not so much to present the nature of the relation of each of these
factors to yield, but rather to indicate how the simplified method may be
applied to complicated yield problems, the analyses of which ordinarily
consume a great deal of time. The weather factors used are (1) rainfall
during February, March and April, (2) a measure of snow cover (the number of
days of one inch or more of snow on the ground) and (3) average temperature
in March and April. The years have been numbered 1 to 10 inclusive. 1/

The procedure followed in this illustration is practically identical with
that described under Case VII in Part I of this report, except for a slight
modification in the device used to find sets of observations in which the
influence of two factors appears to be approximately equal, in order to
obtain the first approximation of the influence of the third factor on yield.

The first step, shown in Section 1 of Figure 12, is to plot yield against
one of the independent variables (rainfall) and then to study the variations
in the other two variables so as to obtain a first approximation curve for
Section 1. Instead of plotting the two independent factors consecutively, as
was done in Figure 9 (Case VII), we make use of a scatter diagram (see
Section 2 of Figure 12) with temperature plotted against snow cover.
Inspecting this scatter diagram for two or more observations in which these
two factors may be assumed to have approximately equal values, we note that
(a) observations 8 and 9 have relatively low temperature and low snow cover
values, (b) observations 4 and 5 have relatively low temperature, but
greater snow cover values, and (c) observations 3 and 6 have average or
better than average temperatures, and still greater amounts of snow cover.
The sets of observations in Section 1 comparable to these may now be
inspected to obtain suggestions of the nature of the influence of rainfall.
It should be observed that none of the sets of observations in Section 2
contained equal values (for instance, 8 has higher temperature and lower
snow cover than 9). Consequently the dotted lines in Section 1 do not
connect the two observations in each set, but they nevertheless suggest the
slope and shape of the first approximation curve.

1/ The data for observations 1-9 inclusive used in this illustration were
supplied by Mr. S. H. Newell of the Division of Crop and Livestock Estimates,
who has successfully used these factors in forecasting the yields of the past
two seasons. Data for observation number 10 are based on preliminary curves.

The next step is shown in Section 3 of Figure 12, where deviations from
the curve in Section 1 are plotted against snow cover. The shape of the
first approximation curve is here revealed by connecting the observations in
the order of the values of X3, the factor that here needs to be held
constant. Note that observations 7, 5, 9, 1, 4 and 8 are connected in
sequence in the order given, because the corresponding values of X3 are
44.0, 45.5, 45.9, 47.0, 47.5, 47.7. The other observations are also
connected in sequence, 3, 6, 10, 2, for the corresponding values of X3 are
49, 50, 54, 56. Each set of dashed lines suggests the first approximation
free hand curve shown in Section 3. Deviations from the free hand curve in
Section 3 are then plotted in Section 4 against temperature, and a first
approximation curve drawn through them.
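The order in which the observations are connected in Section 3 is simply their order by temperature (X3). A small sketch of that ordering follows. The temperatures are those of the Case X data, except that observation 3 is entered as 49.0, the value quoted in the text for the second sequence; the split at 48 degrees between the two sets is an assumption made here to reproduce the two groups.

```python
# Observation number -> temperature X3 (degrees), from the Case X data.
# Observation 3 is taken as 49.0, the value quoted in the text (an assumption
# where the scanned table is unclear).
temperature = {1: 47.0, 2: 56.0, 3: 49.0, 4: 47.5, 5: 45.5,
               6: 50.0, 7: 44.0, 8: 47.7, 9: 45.9, 10: 54.0}

# Connect observations in ascending order of the factor held constant.
order = sorted(temperature, key=temperature.get)

# Hypothetical dividing line between the cooler and warmer sets.
cool = [k for k in order if temperature[k] < 48.0]
warm = [k for k in order if temperature[k] >= 48.0]

print(cool, warm)
```

Each list gives one chain of dashed lines; within a chain, neighbouring observations differ little in temperature, so the line joining them mostly reflects the influence of snow cover.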

To test the validity of the first approximation curves in Sections 1, 3,
and 4, it is necessary to transfer the deviations about the curve in 4 to
each of the other curves. This may be accomplished by repeating the three
preliminary curves in Sections 5, 6, and 7 and then plotting deviations from
Section 4 as deviations about the first approximation for X1 X2 in Section
5. This process suggests a somewhat steeper slope for the curve X1 X2.
Consequently a second approximation curve is drawn. But inasmuch as the
second curve still shows deviations to be accounted for, these need to be
transferred to the X1 X4 first approximation in Section 6. Here too, the
deviations suggest a slight change in the preliminary curve, namely, raising
it for low values of X4 and lowering it for high values, as shown by the
second approximation. Finally, the deviations about the second approximation
in Section 6 are transferred to the first approximation in Section 7, X1 X3,
and the latter modified slightly to reduce the residuals numbered 1 and 3.

By this simple process three net curves, X1 X2, X1 X4 and X1 X3, are
developed which account for practically all of the variations in yields for
the 10-year period under examination, except about 1 bushel in the year
marked "3".

If from this point on it is desired to compute correlation and
determination coefficients, the usual procedure may be followed by treating
the deviations from the second approximation curve X1 X3 in Section 7 as
final residuals from which to compute the standard deviation required for
the index of correlation formula.
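As a rough illustration of that last computation, the sketch below assumes the usual form of the index of correlation, the square root of 1 minus the ratio of the variance of the residuals to the variance of the dependent variable. The yields are those of the Case X data; the residuals are hypothetical (zero everywhere except one bushel in year 3, loosely following the remark above) and are not readings from Figure 12.

```python
import numpy as np

def index_of_correlation(y, residuals):
    """sqrt(1 - var(residuals) / var(y)), using standard deviations about the mean."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(residuals, dtype=float)
    return float(np.sqrt(1.0 - z.var() / y.var()))

# Wheat yields (bushels) for years 1 to 10, from the Case X data.
yields = [16.3, 14.0, 16.5, 19.3, 15.5, 20.8, 22.6, 17.5, 16.5, 7.0]

# Hypothetical final residuals, for illustration only.
residuals = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

p = index_of_correlation(yields, residuals)
print(round(p, 3))
```

An index obtained in this way would still be subject to the correction for the number of observations and constants before being compared with results from other problems.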
Data used in Cases VIII and IX

                   Case VIII                           Case IX
        ---------------------------------   ----------------------------
        Average price     Yearly changes                 Changes in
        per ton of        in United States  Corn-hog     number of hogs
 Year   cabbage received  cabbage acreage   ratio 2/     on farms
        by producers 1/                                  January 1

        Dollars           1,000 acres       Bushels      Millions

 1919      19.58                              10.3
 1920      16.26            + 27.6             9.8         - 3.84
 1921      28.52            - 19.2            14.0         - 1.36
 1922      12.94            + 29.2            14.4         +  .96
 1923      23.28            - 28.9             8.0         + 9.48
 1924      16.04              14.2             8.2         - 2.68
 1925      17.02            +   .9            11.5         - 10.79
 1926      19.03            +  9.3            16.9         - 3.42
 1927      15.97            + 14.5            12.7         + 2.64
 1928      24.43            -  7.0             9.9         + 5.63
 1929        --               --                           - 5.46

1/ Adjusted for changes in crop year index of farm prices, 1927-28 = 100.
2/ Farm price of hogs per hundredweight divided by farm price of corn per
   bushel, calendar year average.


Data used in Case X

                                          Index of
 Year    Rainfall 1/    Temperature 2/    snow cover 3/    Yield

         Inches         Degrees           Days             Bushels

   1        7.0            47.0              33             16.3
   2        3.5            56.0              10             14.0
   3        4.8            49.0              30             16.5
   4        6.4            47.5              18             19.3
   5        8.5            45.5              20             15.5
   6        1.4            50.0              33             20.8
   7        3.0            44.0              24             22.6
   8        5.3            47.7               8             17.5
   9        6.6            45.9              13             16.5
  10        9.0            54.0               8              7.0

1/ During February, March and April.
2/ Average for March and April.
3/ Number of days of one inch or more of snow on ground.
CHANGES IN U. S. CABBAGE ACREAGE, 1921-1928

Hogs on farms, changes in number Jan. 1 to Jan. 1